Hadoop Introduction
- Introduction to Data and System
- Types of Data
- Traditional ways of dealing with large data and their problems
- Types of Systems & Scaling
- What is Big Data
- Challenges in Big Data
- Challenges in Traditional Application
- New Requirements
- What is Hadoop? Why Hadoop?
- Brief history of Hadoop
- Features of Hadoop
- Hadoop and RDBMS
- Overview of the Hadoop ecosystem
Hadoop Installation
- Installation in detail
- Creating Ubuntu image in VMware
- Downloading Hadoop
- Installing SSH
- Configuring Hadoop, HDFS & MapReduce
- Downloading, installing & configuring Hive
- Downloading, installing & configuring Pig
- Downloading, installing & configuring Sqoop
- Configuring Hadoop in Different Modes
Hadoop Distributed File System (HDFS)
- File System - Concepts
- Blocks
- Replication Factor
- Version File
- Safe mode
- Namespace IDs
- Purpose of NameNode
- Purpose of DataNode
- Purpose of Secondary NameNode
- Purpose of JobTracker
- Purpose of TaskTracker
- HDFS Shell Commands – copy, delete, create directories etc.
- Reading and Writing in HDFS
- Differences between Unix commands and HDFS commands
- Hadoop Admin Commands
- Hands on exercise with Unix and HDFS commands
- Read / Write in HDFS – Internal Process between Client, NameNode & DataNodes
- Accessing HDFS using Java API
- Various Ways of Accessing HDFS
- Understanding HDFS Java classes and methods
- Commissioning / Decommissioning DataNodes
- Balancer
- Replication Policy
- Network Distance / Topology Script
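
The Blocks and Replication Factor topics above come down to simple arithmetic, which can be sketched as follows. This is a minimal illustration, not Hadoop code: the function name `hdfs_footprint` is hypothetical, and the defaults assume a 128 MB block size and a replication factor of 3 (the usual defaults in Hadoop 2.x and later; Hadoop 1.x used 64 MB blocks).

```python
MB = 1024 * 1024


def hdfs_footprint(file_size, block_size=128 * MB, replication=3):
    """Estimate how one file is laid out in HDFS.

    Hypothetical helper for illustration only.
    Returns (blocks, replicas, raw_bytes).
    """
    if file_size < 0 or block_size <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    # Ceiling division: a partial final block still counts as a block.
    blocks = -(-file_size // block_size)
    # Every block is stored `replication` times across DataNodes.
    replicas = blocks * replication
    # The final block is not padded, so raw usage follows the file size.
    raw_bytes = file_size * replication
    return blocks, replicas, raw_bytes


blocks, replicas, raw = hdfs_footprint(300 * MB)
print(blocks, replicas, raw // MB)  # prints: 3 9 900
```

A 300 MB file therefore occupies three blocks (two full, one partial), nine block replicas, and about 900 MB of raw cluster storage under these assumed settings.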
MapReduce Programming
- About MapReduce
- Understanding blocks and input splits
- MapReduce Data types
- Understanding Writable
- Data Flow in MapReduce Application
- Understanding MapReduce problem on datasets
- MapReduce and Functional Programming
- Writing MapReduce Application
- Understanding Mapper function
- Understanding Reducer Function
- Understanding Driver
- Usage of Combiner
- Usage of Distributed Cache
- Passing the parameters to mapper and reducer
- Analysing the Results
- Log files
- Input Formats and Output Formats
- Counters; skipping bad and unwanted records
- Writing joins in MapReduce with two input files; join types
- Executing a MapReduce job: insights
- Exercises on MapReduce
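
The data flow covered in this module (mapper, shuffle and sort, reducer) can be previewed with a word count written in plain Python. This is a single-process sketch that only imitates the flow; it does not use the Hadoop API, and the names `mapper`, `shuffle`, `reducer`, and `run_job` are hypothetical. In a real job the mapper and reducer would be Java classes wired together by a driver.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle and sort: group all values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(word, counts):
    """Reduce phase: collapse the list of counts for one word into a total."""
    return word, sum(counts)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped))


print(run_job(["big data", "Big cluster"]))
```

Because addition is associative and commutative, the same `reducer` logic could also serve as a combiner, pre-summing counts on the map side before the shuffle.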